Revisiting animal photo-identification using deep metric learning and network analysis
Authors
Abstract
In many respects, population and behavioural ecology have immensely benefited from individual-based, long-term monitoring of animals in wild populations (Clutton-Brock & Sheldon, 2010; Hayes & Schradin, 2017). At the heart of such monitoring is the ability to recognize individuals. Individual identification is often achieved by actively marking animals, for example by deploying ear-tags or leg rings, cutting fingers or feathers, or scratching the scales of reptiles (Silvy et al., 2005). In some species, however, individuals display natural marks that make them uniquely identifiable. For instance, large African mammals such as the leopard Panthera pardus, zebra Equus sp., kudu Tragelaphus strepsiceros, wildebeest Connochaetes taurinus and giraffe Giraffa camelopardalis all present idiosyncratic fur coat patterns. Non-invasive and reliable identification has long been known to be feasible through comparisons of these distinctive patterns (Estes, 1991). As the number of individuals to identify increases, people-based visual comparison of pictures can rapidly become overwhelming. With the recent move to digital technologies (namely digital cameras and camera traps), the problem becomes even more acute, as the number of pictures to process can easily reach thousands to tens of thousands. Over the last decade, the use of computer vision has spread into the biological sciences and become a standard tool in animal ecology for repetitive tasks (Weinstein, 2018). In a seminal publication, Bolger et al. (2012) first presented computer-aided photo-identification, initially for giraffes but recently applied to dolphins (Renó et al., 2019). The underlying technique is a feature matching algorithm based on the Scale Invariant Feature Transform operator (SIFT; Lowe, 2004), where each image is associated with its k-nearest best matches. The current SIFT implementation for ecologists requires human intervention to validate proposed candidate images within a graphical interface (Bolger et al., 2011). In the same vein, other feature-based proposals were developed over the decade to apply to different types of idiosyncrasies (Hartog & Reijns, 2014; Moya et al., 2015).
A drawback of this method frequently arises when two images are considered similar not because of skin similarities but because of similar backgrounds (the presence of a tree, for instance), hence leading to false-positive results. In computer vision, images should be cropped beforehand so that only the relevant part of the animal appears in the image to be analysed and compared (e.g. excluding most of the neck, head, legs and background for herbivores). Until now, this cropping operation was done manually (Halloran et al., 2015), despite being a highly time-consuming task when processing thousands of images. Meanwhile, the Deep Learning (DL) revolution underway is showing breakthrough performance improvements (Christin et al.). In particular, convolutional neural networks (CNNs) are now the front-line approach to deal with a range of questions in environmental sciences (Lamba et al.). Many studies tackle the general problem of re-identification using CNNs, which has mostly been studied extensively for humans (Wu et al.). Technically, it consists of training a CNN to classify individuals not necessarily seen before, that is, unknown at training time. Building on the availability of proven, efficient techniques (Zheng et al., 2016), several successful attempts have been made for non-human species (Bogucki et al., 2019; Bouma et al.; Chen et al., 2020; Ferreira et al.; Hansen et al., 2018; He et al.; Körschens et al.; Moskvyak et al.; Schneider et al.; Schofield et al., 2019), but re-identification remains challenging when re-observations are too limited to train a model satisfactorily sensu largo (Schneider et al.). In practice, CNN-based approaches must be tailored to the needs of field ecologists interested in tools for individual recognition. Batches of new images are regularly added to the reference database following yearly fieldwork sessions, with recruitment of newborns and immigrants if the studied population is demographically open. Therefore, we expect both the re-sighting of known individuals as well as the observation of individuals never seen before. In other words, this sampling design implies solving a mixture of what Schneider et al. (2020) referred to as the 'open set' problem, in which a model cannot simply assign a single 'unknown' label. Automatically identifying previously known individuals speeds up the picture sorting process and facilitates adding new individuals whose life history is to be monitored. A classical classifier built to re-identify already known individuals (usually with a softmax layer) will fail because predicted classes must match the classes seen during training. We therefore crucially need an approach able to filter out unknown individuals at the time of analysis. We propose to rely on deep metric learning (DML, see Hoffer & Ailon, 2015) as an ideal solution to this problem.
DML consists of training a CNN to embed input data (here, input images) into a multidimensional Euclidean space such that data sharing a common class (here, a given individual) are, in terms of distance, much closer to each other than to the rest of the data. Here, we addressed photo-identification with an updated, open-source, end-to-end automatic pipeline, taking as a case study the iconic, endangered giraffe. In a first step, state-of-the-art object detection CNNs (Lin et al., 2017) automatically cropped the flanks of about 4,000 raw photographs shot in Hwange National Park, Zimbabwe. Indeed, such CNNs have clearly outperformed earlier detection approaches (Girshick et al., 2014), including the Histogram of Oriented Gradients (HOG) operator (Buehler et al.). Second, following Bolger et al. (2012), we calculated a numeric distance between all pairs of flanks. From the n × n calculated distances, we followed the framework of similarity networks (Wang et al., 2018) to retrieve, in an unsupervised way, clusters of images coming from the same individual, removing the need for any manual identification. Third, we validated a subset of our results to build a ground-truth dataset (n = 82 individuals). Using this set, we trained a supervised DML strategy and evaluated its predictive accuracy with a cross-validation procedure.

Fieldwork was carried out in the northeast of Hwange National Park (HNP), Zimbabwe. HNP covers a 14,650 km2 area (Chamaillé-Jammes et al., 2009). The giraffe sub-species present could be either G. c. angolensis or G. c. giraffa according to the IUCN (Muller et al.). Regular surveys were conducted between 2014 and 2018. Each year, for at least three consecutive weeks, we drove roads daily within <60 km of the Main Camp station and took pictures of every giraffe encountered. Pictures were taken with 200–300 mm lenses mounted on Nikon DSLR cameras (sensor resolution ranged from 16 to 40 Mpx). When taking pictures in the field, a burst mode was set, producing sequences of several pictures per second. From these sequences, we retained one photograph per sequence, yielding a total of 3,940 photographs.

Several object detection tools are available (Parham et al.; Sadegh Norouzzadeh et al.). Among the options are YOLO (Bochkovskiy et al.; Redmon et al., 2016), Mask R-CNN (He et al., 2017) and RetinaNet. We retained RetinaNet, a detector able to detect a series of predefined classes (e.g. species) that returns the coordinates of a bounding box around detected objects, together with a confidence score. These steps are performed by a single CNN, which makes RetinaNet a fast, one-stage detector, as opposed to two-stage detectors in which a first stage searches for regions containing a potential object and a second stage classifies them (Redmon et al., 2016).
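As a toy illustration of the pairwise-distance step (this is not the authors' code: real descriptors come from SIFT, and the hand-made vectors below are purely hypothetical), one can score a pair of flank images by matching each keypoint descriptor of one image to its nearest descriptor in the other and averaging the k best matches, so that images of the same coat pattern get a small distance:

```python
import math

def euclidean(u, v):
    # Euclidean distance between two descriptor vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def flank_distance(desc_a, desc_b, k=25):
    """Toy stand-in for a SIFT-based image distance: match each
    descriptor of image A to its nearest descriptor in image B,
    then average the k closest matched pairs (the paper keeps 25)."""
    best = sorted(min(euclidean(da, db) for db in desc_b) for da in desc_a)
    best = best[:k]
    return sum(best) / len(best)

# Two "images" whose descriptors nearly coincide, and one that differs.
img1 = [[0.0, 1.0], [2.0, 2.0], [5.0, 1.0]]
img2 = [[0.1, 1.0], [2.0, 2.1], [5.0, 0.9]]
img3 = [[9.0, 9.0], [8.0, 7.0], [7.0, 9.0]]

# Same individual should score far below a different one.
assert flank_distance(img1, img2, k=3) < flank_distance(img1, img3, k=3)
```

In the actual pipeline the descriptors are 128-dimensional SIFT vectors and the matched key points are further filtered by a homography, as described below.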
Moreover, the confidence score allows better management of non-informative detected objects. Finally, because our image set is heterogeneous (various positions, backgrounds, scales and lighting), a known source of difficulty (Beery et al., 2018), we used data augmentation (flipping, rotation and colour changes of the photographs) to enhance detection performance. Classification CNNs are typically trained on a huge amount of images (> millions) to capture the discriminant features of each class. Because only a limited number of annotated images was at hand, we relied on transfer learning (Shin et al.). Transfer learning is a technique for training a CNN on a specific task with a small dataset: training does not start 'from scratch' with random parameters, but uses the parameters of a network previously trained on another task of interest (Willi et al.). This works because a pre-trained CNN has already learnt a wide range of generic features. We prepared bounding boxes around giraffe flanks, separating them from the background, with the labelImg annotation program (https://github.com/tzutalin/labelImg). With a single class, the flank, we shipped these annotations to RetinaNet, with a ResNet50 backbone pre-trained on the COCO dataset (80 object classes, among which a few animal species; Lin et al., 2014). We trained for 30 epochs of 100 steps with a batch size of 2. Our implementation is based on the Keras implementation at https://github.com/fizyr/keras-retinanet.

We then built a pipeline to achieve pattern matching with the SIFT operator (Lowe, 2004), still commonly used today (Bellavia & Colombo, 2020). The SIFT algorithm extracts characteristic features, called key points, that are invariant with respect to scale and orientation. Comparing two photographs, the matching key points (i.e. those having similar characteristics) are retrieved and ranked according to the distance between their respective feature vectors. We selected the 25 closest pairs of key points. To filter these results, we had to assess the extent to which matched key points had a consistent location on the animal's body. To find out whether cases were actual matches between coat patterns, we superimposed the extracted key points of each pair of images using a geometrical transformation called a homography. A homography is a perspective transformation between two planes which, for a set of key points in one image, finds the transformation placing them as close as possible to the corresponding key points in the other image. It preserves the relative positioning and perspective of points. Once the two images were projected into a common plane, we could compute the distance between matched key points, obtaining a SIFT-based distance between images. We used the openCV library version 3.4 (Bradski, 2000).

Following the computation of the pairwise distances obtained with this approach, we searched for clusters of flank images coming from the same individual. We defined a network made of nodes representing flank images, connected by edges: two images were connected by an edge if the distance between their paired key points fell below a given threshold (see details below). The so-called connected components of this network associate images estimated to come from the same individual. We took advantage of a property of complex networks called explosive percolation (Achlioptas et al.), which predicts a phase transition when the number of edges rises just above a critical point. At this point, adding edges to the network, for example by slightly increasing the distance threshold (Hayasaka), leads to the sudden appearance of a giant component encompassing the majority of nodes.
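The percolation-based threshold choice can be sketched in plain Python (a hypothetical miniature with five images and a hand-made distance matrix, not the paper's data): below the critical point the largest connected component is one individual's small cluster, while a lax threshold produces a giant component spanning all nodes.

```python
def components(n, edges):
    # Union-find to extract the connected components of an n-node graph.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def giant_component_size(n, dist, threshold):
    # Connect every pair of images whose distance falls below the
    # threshold, then return the size of the largest component.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if dist[i][j] < threshold]
    return max(len(c) for c in components(n, edges))

# Toy distance matrix: images 0-2 are mutually close (same giraffe),
# 3-4 are close, and all cross-group pairs are far apart.
d = [[0, 1, 1, 8, 9],
     [1, 0, 1, 9, 8],
     [1, 1, 0, 9, 9],
     [8, 9, 9, 0, 1],
     [9, 8, 9, 1, 0]]
sizes = {t: giant_component_size(5, d, t) for t in (2, 10)}
# Below the percolation point the largest cluster is one individual
# (3 images); past it, a giant component swallows every node.
assert sizes[2] == 3 and sizes[10] == 5
```

Plotting the largest-component size against the threshold reproduces the graphical criterion described next: the chosen threshold sits just before the sudden jump.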
Since any further increase would end up connecting almost all images, the threshold was determined graphically, by selecting the point at which the size of the largest component starts to increase dramatically (Supporting Information Figure S2). An additional issue arose from images being erroneously connected (example in Figure S1) because they looked similar: in some cases, the bodies of two giraffes overlap in a photograph. In this situation, images of different individuals might be linked by edges even though they actually show different giraffes. We used clustering by community detection, a standard approach in network science (Fortunato, 2010), to split, only where relevant, any component into groups of images significantly more connected among themselves than with the others, called communities. The presence of a single community inside a group suggested a single individual, whereas its absence informed us of an inconsistency (heterogeneity between individuals). We used InfoMap (Rosvall & Bergstrom, 2008). The final product is the set of image clusters corresponding to the communities found by InfoMap.

The principle of DML is to find an optimal way to project images into a feature space, a classical machine learning task. In this context, we used the triplet loss (Hermans et al.), in line with Bouma et al. (2019). The triplet loss relies on triplets composed of an anchor image, another positive image of the same class (the same individual here) and a third negative image (any other giraffe). The training step consists of optimizing the distance computed in the embedding layer (hereafter 'distance') so that it is minimal between anchor and positive while maximizing its negative counterpart. We used the improved semi-hard strategy (Schroff et al.), which deals with the moderately difficult (so-called 'hard') cases, with the TripletSemiHardLoss function of TensorFlow Addons. After training completion, the distance between images can again be computed from the embedding vector composing the last layer of the model.

We derived the training and test datasets required for DML from individuals identified by the network-based algorithm, fulfilling three conditions: (a) the cluster contains images taken a minimum of 1 hr apart; (b) the cluster can be divided into enough images to perform training (we imposed a minimum of five images), coming from different sequences; (c) the cluster demonstrated perfect, verified consistency. For training, the time-lag condition ensured complete independence between datasets acquired under different conditions (time, season and location). This was of utmost importance, since errors would lead to sub-optimal performance of the approach. We carefully checked, manually, that clusters were perfectly unambiguous. We ensured a high level of image quality by discarding images in which giraffes overlapped in the frame or were indifferently oriented towards the back or the front (orientation ambiguities). We focused on the central part of the flank, keeping 80% of the original width and 60% of the height (in particular excluding the neck and background). By doing so, we wanted to prevent the CNN from capturing noise. Additionally, we homogenized contrast by normalizing the image channels with the imagemagick package (normalize option; https://imagemagick.org). Images were resized to 224 pixels. We ended up with a median of seven images per individual (Table 1) in this set. This particularly low number is what led us to the DML framework, which suits such few-shot training problems.
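The triplet objective itself is compact. A plain-Python sketch (the pipeline uses TensorFlow Addons' TripletSemiHardLoss over whole batches with semi-hard negative mining, not this standalone function) shows how the margin penalizes negatives that come too close to the anchor:

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Single-triplet version of the triplet objective: push the
    anchor-positive embedding distance below the anchor-negative
    distance by at least `margin`; zero loss once that holds."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    return max(dist(anchor, positive) - dist(anchor, negative) + margin, 0.0)

# Hypothetical embeddings of three flank images: the same giraffe
# twice (anchor and positive), then a different giraffe (negative).
a, p, n = [0.0, 0.0], [0.1, 0.0], [1.0, 1.0]
assert triplet_loss(a, p, n) == 0.0          # already well separated
assert triplet_loss(a, p, [0.2, 0.0]) > 0.0  # negative too close: loss kicks in
```

Minimizing this quantity over many triplets is what shapes the embedding space so that images of one individual cluster tightly, which the evaluation below exploits.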
We implemented a 10-fold cross-validation procedure with extensive data augmentation using the imgaug Python library (https://github.com/aleju/imgaug). For each dataset, we applied transformations modifying orientation and size, adding blur, performing edge detection, and altering Gaussian noise, colours and brightness (details in the code). We finally obtained 11 images per original image: the original plus 10 modified versions. To quantify overall learning performance, we replicated the procedure 10 times. We randomly set aside 25% of the individuals for the purpose of open-set evaluation and, for each of them, applied the conditions above. We then trained the CNN on the remaining 75%. We kept only good-quality data, thanks to the (at least) 1-hr lag between observations. Once selection was completed, we used the ResNetV2 architecture readily available in Keras, with batches of 80 augmented images over 42 epochs, the stochastic gradient descent optimizer and a learning rate of 0.2, under TensorFlow 2.3.0.

To mimic re-identification per se, literally 're-seeing' an individual, we built a 'reference book' of representative images of known individuals, drawn from the training dataset. We then compared each query image to this book in the embedding space; in essence, the closest reference images are expected to be ones of the same individual. Similarly, we also applied a distance threshold to decide whether a query individual was known at all. The stringency of this threshold was arbitrarily varied between 0 and 1, and performance was quantified over these values. First, we computed the Top-1 accuracy, consisting of checking whether the reference image at the smallest distance from the query depicts the same individual, in the following called the true-positive (TP) rate. False positives (FP) occur when a query of an unknown individual is nonetheless matched to the reference book. Again, over the range of threshold values, we checked whether unknown individuals were successfully detected as such by computing the true-negative (TN) rate. Of 469 candidate clusters, 400 were validated manually. Training took a matter of minutes on a Titan X card.

When applying the detection CNN (Figure 1a), we obtained 5,019 bounding boxes, each supposed to contain a giraffe flank (Figure 2a). Detection failed on 186 images (failure rate: 4.7%), due to foreground vegetation and unusual, difficult examples (Figure 1b): giraffe bodies overlapping other giraffes, animals partially out of frame or, in rare instances, standing far away in the photograph, situations in which retrieving the exact boundaries of the flank is hard. The worst failures we experienced involved blurry images (Figure 2b). Running the SIFT operator (Lowe, 2004) to compare all pairs of images took about 800 CPU hours on our computing resources. Applying the network approach (Section 2) with a threshold of 340 (Figure S2a), the network contained 11,249 edges and 1,417 connected components, of which 781 were singletons, before the network-based clustering relying on community detection (different colours in Figure 3). The distribution of cluster sizes was, by definition, more concentrated after community detection (Figure S3), with a maximal size of 35 instead of 373. One artefact of chained overlaps was split by community detection (Figure S4): a component of 316 images was divided into 5 communities, and 105 individuals were found in this giant component, with a posteriori checks of their consistency by comparing images directly. To obtain a fair measure of performance, we saved the 82 human-validated, unambiguous individuals whose clusters contained images at least 1 hr apart (Table 2). Those 822 images were used for the deep metric learning. After training (Figure 4), the model returned a correct match (TP rate) for 85% of queries on average (Figure 5). Failures repeatedly involved images that were impossible to match because of bad conditions or conspicuous disturbing elements in the forefront (Figure S6). Without these problematic cases, the TP rate reached 90% on average (Figure 5). Interestingly, whenever a match existed in the reference book (here, for most queries), the closest reference image was almost always the correct one (Figure S5a), once projected into the embedding space.
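The reference-book evaluation with an open-set threshold can be sketched as follows (hypothetical identifiers and two-dimensional embeddings for illustration; the real pipeline compares CNN embedding vectors):

```python
def nearest_reference(query, reference_book, dist):
    # Return (identity, distance) of the closest reference-book image.
    best = min(reference_book, key=lambda r: dist(query, r["embedding"]))
    return best["id"], dist(query, best["embedding"])

def classify(query, reference_book, dist, threshold):
    """Open-set decision: accept the Top-1 match only when its
    embedding distance falls under the threshold, otherwise
    declare the query a new, unknown individual."""
    ident, d = nearest_reference(query, reference_book, dist)
    return ident if d < threshold else "unknown"

def euclid(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# Hypothetical reference book of two known giraffes.
book = [{"id": "G1", "embedding": [0.0, 0.0]},
        {"id": "G2", "embedding": [1.0, 1.0]}]

# A query close to G1 is re-identified; a distant one is rejected as new.
assert classify([0.05, 0.0], book, euclid, threshold=0.25) == "G1"
assert classify([3.0, 3.0], book, euclid, threshold=0.25) == "unknown"
```

Sweeping `threshold` over a grid and counting correct matches (TP) and correct rejections (TN) yields the trade-off curves discussed next.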
In this space, images of unknown individuals should be distant from all reference images, so the prediction of novelty is partly supported by distance alone. For small threshold values (d ≤ 0.1), the TN rate was above 95% but decreased markedly as the threshold grew; at the same time, the TP rate started below 70% (for d ≤ 0.1) and levelled off as the threshold increased. Hence, too small a threshold gave unexpected results (Figure S5b). A threshold of 0.25, the crossing point of both rates, offered the best compromise.

Our pipeline is complementary to tools already used in the field. Based on deep learning and network analysis, it goes further than previous solutions in the literature since it is an end-to-end, comprehensive method, and it achieves rather good performance. Image quality proves critical: when a cascade of errors occurs, erroneous identifications and difficulties arise and, in the worst case, clusters are mixed. Our results show that the number of labelled images needed is modest (a few hundreds), but what happens at other sites ('Terra Incognita', quoting Beery et al.) remains an open question. Nevertheless, fine-tuning is possible for researchers dealing with their own data, using the code we provide. Further perspectives arise from contour segmentation methods, which extract the contour of the whole animal, creating a mask (Brodrick et al.). Giraffe contouring could possibly help remove residual background noise, at the cost of building a training set of hundreds of annotated images, a substantial effort. We also recast the statistical problem as a network one: clusters of images are retrieved efficiently with well-known network analysis tools, in contrast with Bolger et al. (2012). This proves useful because false-positive matches are recurrent, typically occurring via the background (Figures 3 and 4). A frequent configuration we faced was a node linked to 2 groups, the latter li…
Similar articles
Deep Metric Learning Using Triplet Network
Deep learning has proven itself as a successful set of models for learning useful semantic representations of data. These, however, are mostly implicitly learned as part of a classification task. In this paper we propose the triplet network model, which aims to learn useful representations by distance comparisons. A similar model was defined by Wang et al. (2014), tailor made for learning a ran...
Skin Lesion Analysis towards Melanoma Detection Using Deep Learning Network
Skin lesions are a severe disease globally. Early detection of melanoma in dermoscopy images significantly increases the survival rate. However, the accurate recognition of melanoma is extremely challenging due to the following reasons: low contrast between lesions and skin, visual similarity between melanoma and non-melanoma lesions, etc. Hence, reliable automatic detection of skin tumors is v...
Identifying Style of 3D Shapes using Deep Metric Learning
We present a method that expands on previous work in learning human perceived style similarity across objects with different structures and functionalities. Unlike previous approaches that tackle this problem with the help of hand-crafted geometric descriptors, we make use of recent advances in metric learning with neural networks (deep metric learning). This allows us to train the similarity m...
The relationship between using language learning strategies, learners' optimism, educational status, duration of learning and demotivation
With the growth of more humanistic approaches towards teaching foreign languages, more emphasis has been put on learners' feelings, emotions and individual differences. One of the issues in teaching and learning English as a foreign language is demotivation. The purpose of this study was to investigate the relationship between the components of language learning strategies, optimism, duration o...
Simulate Congestion Prediction in a Wireless Network Using the LSTM Deep Learning Model
Wireless networks have achieved wide prevalence since their beginning due to the increase of wireless devices, represented by smartphones and laptops, and the proliferation of networks coincides with the high speed and ease of use of the Internet and the delivery of various data such as video clips and games. Here the congestion problem arises, and the aim of the research is t...
Journal
Journal title: Methods in Ecology and Evolution
Year: 2021
ISSN: 2041-210X
DOI: https://doi.org/10.1111/2041-210x.13577